Locations performance improvements#14718
Open
dogboat wants to merge 47 commits intoDefectDojo:devfrom
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rework importing when using Locations to be not as slow.
Some changes are made to AbstractLocation and the URL model to support this. First, the
identity_hashattribute andset_identity_hash()method are moved from URL to AbstractLocation to support bulk operations across location types. AbstractLocation now callsset_identity_hash()as part of itsclean()routine. Finally, abulk_get_or_create()classmethod is added to AbstractLocation; it is responsible for (hopefully efficiently) bulk getting-or-creating an iterable of the appropriate type (e.g., URLs). It works by querying for existing locations based onidentity_hashvalues, and only creating new ones using Django'sbulk_create()method.Both LocationManager and EndpointManager have been changed. Of greatest impact, LocationManager has been updated to use the same accumulator pattern that EndpointManager was recently updated to use (thanks Val!). While doing this, I noticed the opportunity to consolidate things a bit; rather than maintaining the V3 if/else structure throughout, I standardized around the set of methods used by EndpointManager. At the same time, I inserted another level of indirection -- LocationHandler, which under the covers delegates to one or the other manager -- to help keep the two managers more in line going forward. The intent is, we won't be able to introduce one-off features for only Endpoints or Locations without at least considering how to handle the other. Plus, it removes a lot of the "#TODO: DELETE ME IN THE FUTURE MAYBE" comments right now. Additionally, LocationProductReference status is now kept in sync with finding-level status changes during reimport (if any finding ref is Active, the product ref is Active; otherwise Mitigated). This was a pre-existing gap.
The general process was, take the excellent pattern set by EndpointManager, and beat it until it worked with Locations too.
In my very unscientific testing of a 3k findings import, V2 finished in ~21s, while V3 finished in ~23s, putting Locations solidly within tickling distance of Endpoints.